
Table of Contents
1. Purpose
2. Background
3. Data Analysis
3.1. Demographic Analysis
3.2. Time-Series Analysis
3.3. Geographic Analysis
3.4. Year-Over-Year Analysis
4. Conclusion
5. References
1. Purpose
This report analyses Citi Bike’s data from New York City for the month
of January 2020. We analyse the data from several perspectives,
including demography, time series and geography. To do so, we have
created visualizations using different types of plots: Leaflet maps,
network diagrams, line graphs, bar charts, donut charts, scatter plots
and pie charts. Some of these visualizations are interactive, so the
reader can explore the data and draw insights directly. We also provide
inferences and suggestions based on our reading of the data.
2. Background
Citi Bike is the largest bicycle sharing program in the USA. It is
privately operated and serves the New York City boroughs of the Bronx,
Brooklyn, Manhattan and Queens, as well as Jersey City and Hoboken in
New Jersey. The dataset analysed here was retrieved from Citi Bike’s
website, which contains real-time data for the bicycle service.
We inspect this data from different viewpoints to discover patterns and
draw inferences. The report touches on several aspects of the data and
answers questions such as: Who are the customers of Citi Bike? How do
factors such as time, weather and day of the week affect sales? Which
geographic locations are more profitable than others? How has the
business been doing over the past three years?
Moreover, we have taken data for January 2020, 2021 and 2022 to
understand the company’s business in the pre-pandemic, pandemic and
post-pandemic periods.
3. Data Analysis
The data analysis is divided into four sections: Demographic,
Time-Series, Geographic and Year-Over-Year Analysis.
3.1. Demographic Analysis
To analyse the demographic trends in the data for January 2020, we have
plotted the following graphs. The pie chart shows consumer segments
based on user type, i.e. Customers and Subscribers. Customers are
one-time users who can rent a Citi Bike from a rental location without a
membership. Subscribers, as the name suggests, hold a membership, which
can be annual, monthly or casual. The donut chart provides a
gender-based breakdown of customers and subscribers. The stacked bar
graph provides an overview of users by age group and gender.

The above visualizations show that a major share of Citi Bike’s revenue
is generated through subscriptions. Data provided on Citi Bike’s website
suggests revenue of approximately $0.97M for January 2020 from
subscriptions alone. It can also be inferred that 75% of all users are
male subscribers. Further research could inform strategies to promote
Citi Bike to female customers. The bar plot shows that the number of
users below the age of 20 is negligible. Citi Bike could look into the
opportunity to invest in rental bikes for kids and young adults.
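The segment shares behind charts like these can be reproduced with a few counters. The following is a minimal sketch rather than the code used for this report: the sample rows are hypothetical, and the gender coding (0 = unknown, 1 = male, 2 = female) and the 10-year age bands are assumptions.

```python
from collections import Counter

# Assumed gender coding; check the dataset's own field description.
GENDER_LABELS = {0: "Unknown", 1: "Male", 2: "Female"}

# Hypothetical sample rows: (user type, gender code, birth year).
trips = [
    ("Subscriber", 1, 1985),
    ("Subscriber", 1, 1990),
    ("Subscriber", 2, 1978),
    ("Customer", 0, 1969),
    ("Subscriber", 1, 2001),
    ("Customer", 2, 1995),
    ("Subscriber", 1, 1988),
    ("Subscriber", 2, 1992),
]


def share(counter):
    """Convert raw counts into percentages of the total."""
    total = sum(counter.values())
    return {key: round(100 * n / total, 1) for key, n in counter.items()}


def age_group(birth_year, ref_year=2020):
    """Bucket a rider into '<20' or a 10-year band such as '30-39'."""
    age = ref_year - birth_year
    if age < 20:
        return "<20"
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


# Pie chart: share of trips per user type.
by_user_type = share(Counter(u for u, _, _ in trips))
# Donut chart: share of trips per (user type, gender) segment.
by_segment = share(Counter((u, GENDER_LABELS[g]) for u, g, _ in trips))
# Stacked bar chart: trip counts per (age group, gender).
by_age_gender = Counter((age_group(b), GENDER_LABELS[g]) for _, g, b in trips)

print(by_user_type)
print(by_segment)
print(by_age_gender)
```

Note that these are shares of trips, not of distinct riders; the trip records in this sketch carry no rider identifier, so the two cannot be told apart.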
The next two line graphs provide an overview of the number of bikes used
per day, categorized by user type and gender. The third graph was
created using external data to find the correlation between the number
of bikes used and the temperature on that particular day.

From the first graph it can be concluded that there is no significant
correlation between the day of the week and the number of bikes used by
subscribers. However, there is a considerable increase in the number of
bikes used by customers on weekends. Therefore, for people who only use
Citi Bike on weekends and do not have a yearly membership, a weekend
pass would be worth introducing. The second line graph shows that women
use fewer bikes than men, but the overall usage trend is the same for
both genders. The number of bikes used per day also varies with
temperature: the third graph indicates that when the temperature is low,
bike usage for that day goes down as well.
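The two observations above (a weekend lift for customers and a link between temperature and usage) can be checked numerically. The sketch below is illustrative only: the daily figures are hypothetical placeholders, the temperature column stands in for the external weather data, and Pearson's coefficient is one assumed choice of correlation measure.

```python
from datetime import date
from math import sqrt

# Hypothetical daily records:
# (date, rides by customers, rides by subscribers, mean temperature in deg C)
daily = [
    (date(2020, 1, 6), 120, 900, 1.0),    # Monday
    (date(2020, 1, 7), 110, 950, 3.0),    # Tuesday
    (date(2020, 1, 8), 130, 1000, 5.0),   # Wednesday
    (date(2020, 1, 11), 300, 980, 9.0),   # Saturday
    (date(2020, 1, 12), 280, 940, 7.0),   # Sunday
]


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# date.weekday() returns 5 for Saturday and 6 for Sunday.
weekend = [cust for d, cust, _, _ in daily if d.weekday() >= 5]
weekday = [cust for d, cust, _, _ in daily if d.weekday() < 5]
weekend_avg = mean(weekend)
weekday_avg = mean(weekday)

# Correlation between total daily rides and temperature.
total_rides = [cust + sub for _, cust, sub, _ in daily]
temperatures = [temp for _, _, _, temp in daily]
r = pearson(temperatures, total_rides)

print(f"Customer rides per day: weekday {weekday_avg:.0f}, weekend {weekend_avg:.0f}")
print(f"Correlation between temperature and total rides: {r:.2f}")
```

A coefficient close to 1 indicates that rides rise with temperature, but on its own it does not show that temperature causes the change.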
3.2. Time-Series Analysis
Time-series graphs help us analyse patterns in data collected over a
specific time interval. The first graph shows which hour of the day is
most preferred by customers for renting bicycles. The second graph is
interactive and shows the average duration for which a bike was used.
It can be viewed as either a scatter plot or a bar graph.
The first graph shows major spikes in the number of bikes used at the
start and end of office hours, i.e. around 8 a.m. and 6 p.m. on
weekdays. Measures could be introduced to boost bike usage at other
times of the weekday without affecting the current peak usage times. The
majority of people use a bike for 0-30 minutes, which means bikes
become available to other customers more frequently. Schemes developed
in partnership with the government to boost tourism could generate more
traffic during office hours.
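The hourly and duration views described above come down to two groupings of the trip records. A minimal sketch follows, assuming each trip has a start timestamp and a duration in seconds; the sample rows, the timestamp format and the 30-minute bucket width are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trips: (start time, trip duration in seconds).
trips = [
    ("2020-01-06 08:05:12", 540),
    ("2020-01-06 08:40:00", 1260),
    ("2020-01-06 13:15:30", 2400),
    ("2020-01-06 18:02:45", 780),
    ("2020-01-06 18:30:10", 4200),
    ("2020-01-07 08:20:00", 900),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def duration_bucket(seconds, width_min=30):
    """Label a trip with its duration bucket, e.g. '0-30 min'."""
    minutes = seconds / 60
    lower = int(minutes // width_min) * width_min
    return f"{lower}-{lower + width_min} min"


# Number of trips that start in each hour of the day (0-23).
starts_per_hour = Counter(
    datetime.strptime(start, TIME_FORMAT).hour for start, _ in trips
)
# Number of trips in each 30-minute duration bucket.
trips_per_bucket = Counter(duration_bucket(d) for _, d in trips)

peak_hour, peak_count = starts_per_hour.most_common(1)[0]
print(f"Busiest start hour: {peak_hour}:00 with {peak_count} trips")
print(dict(trips_per_bucket))
```

A trip of exactly 30 minutes falls into the "30-60 min" bucket in this sketch, because each bucket includes its lower bound and excludes its upper bound.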
3.3. Geographic Analysis
Geographic analysis helps find geographic patterns in the data. The
Leaflet maps below show the number of bikes picked up and dropped off at
each station, along with the station’s location. The stations are
color coded on a gradient that reflects the total number of bikes
picked up or dropped off at that station.
Click on the graph to get the details of a station and the number of
bikes picked up.
Fig 9: No. of Bikes Picked Per Station.
Fig 10: No. of Bikes Dropped Per Station.
As the above maps show, Manhattan has the highest number of pickups and
drop-offs at most of its stations. Research into user activity could
help capitalize on this demand. Possible solutions include establishing
more stations in close proximity to the existing ones or increasing the
number of racks per station.
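The per-station totals that drive the map colors can be derived by counting start and end stations. The sketch below uses hypothetical station names and trips, and scaling each count against the busiest station is an assumed way of producing a value for a color gradient.

```python
from collections import Counter

# Hypothetical trips: (start station, end station).
trips = [
    ("Station A", "Station B"),
    ("Station A", "Station C"),
    ("Station B", "Station A"),
    ("Station A", "Station B"),
    ("Station C", "Station B"),
    ("Station B", "Station C"),
]

pickups = Counter(start for start, _ in trips)
dropoffs = Counter(end for _, end in trips)


def intensity(counts):
    """Scale counts to the range 0-1 relative to the busiest station."""
    busiest = max(counts.values())
    return {station: round(n / busiest, 2) for station, n in counts.items()}


pickup_intensity = intensity(pickups)
dropoff_intensity = intensity(dropoffs)

for station in sorted(pickups):
    print(station, pickups[station], pickup_intensity[station])
```

The scaled value can then be passed to whichever color palette the mapping library provides.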
We have used a network graph to display the most frequently used routes
and understand the flow of traffic. This will help Citi Bike develop
new ideas to improve services in these areas.
Hover over and drag a node (station) to highlight its most connected
routes.
Fig 11: Route Tracking and Density
The network above can be used to estimate the number of bikes to be
stationed at a particular location so that no station has empty stands
during peak hours. It can also reveal customer movement and demand to
drop off a bike at a nearby location where there is no Citi Bike
station yet, which will help expand the current network.
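The edges of such a network, and a simple stocking signal per station, can be computed from the same trip records. This is a sketch under stated assumptions: the station names and trips are hypothetical, routes are treated as undirected, and net flow (drop-offs minus pickups) is only one possible indicator of where bikes run short.

```python
from collections import Counter

# Hypothetical trips: (start station, end station).
trips = [
    ("Station A", "Station B"),
    ("Station B", "Station A"),
    ("Station A", "Station B"),
    ("Station A", "Station C"),
    ("Station C", "Station B"),
    ("Station B", "Station B"),  # round trip, not an edge in the network
]

# Edge weights: trips per station pair, ignoring direction and round trips.
route_counts = Counter(
    tuple(sorted((start, end))) for start, end in trips if start != end
)
top_route, top_count = route_counts.most_common(1)[0]

# Net flow per station: a negative value means more bikes leave than arrive.
pickups = Counter(start for start, _ in trips)
dropoffs = Counter(end for _, end in trips)
net_flow = {
    station: dropoffs[station] - pickups[station]
    for station in sorted(set(pickups) | set(dropoffs))
}

print(f"Most used route: {top_route} with {top_count} trips")
print(net_flow)
```

Stations with a strongly negative net flow are candidates for extra bikes before peak hours, while strongly positive stations may need more free docks.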
3.4. Year-Over-Year Analysis
Using additional
datasets for Jan 2021 and Jan 2022, we have plotted a grouped bar chart
to compare the total number of bikes used for the same month across
three years.
The graph shows an overall decrease in the number of Citi Bikes used in
January 2022 compared with the previous years. This might be a result of
the Covid-19 pandemic or of climate change over these years. The trend
can be expected to change this year as the pandemic has subsided.
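The size of such a change is easier to compare as a percentage. The sketch below shows the calculation only; the totals are placeholder values chosen for illustration and are not the figures from the datasets used in this report.

```python
# Placeholder January ride totals, NOT actual Citi Bike figures.
january_rides = {2020: 1000, 2021: 900, 2022: 720}


def yoy_change(totals):
    """Percentage change of each year relative to the year before it."""
    years = sorted(totals)
    return {
        year: round(100 * (totals[year] - totals[prev]) / totals[prev], 1)
        for prev, year in zip(years, years[1:])
    }


changes = yoy_change(january_rides)
for year, pct in changes.items():
    print(f"Jan {year} vs Jan {year - 1}: {pct:+.1f}%")
```

With the real monthly totals substituted in, the same function gives the year-over-year change for each pair of consecutive years.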
4. Conclusion
The Citi Bike data examined and visualized in the above sections has
helped us gain insights into general trends and identify areas of
improvement based on those trends. One major area of improvement is to
understand why Citi Bikes are used less by women; for example, is the
bike design one of the reasons? Another is to introduce Citi Bikes for
kids and young adults. Citi Bike can also explore ideas to boost the use
of bikes for longer durations during office hours. A weekend pass is
another scheme that could boost the use of bikes on weekends. Increasing
the number of docks at a station and introducing new stations are viable
options based on the geographic analysis conducted above. Lastly, the
year-over-year comparison shows a dip in the total number of users in
recent years. Covid-19 could be one of the reasons, but a detailed
examination should be done to understand the root cause as a preventive
business measure.
5. References